INTERPOLATION OF ORDINATES AND AREAS
AMONG AREAS. 

By C. H. Forsyth, Dartmouth College. 



Ordinary interpolation consists geometrically of adding one 
or more ordinates in a systematic manner to a given series of 
ordinates. The intervals which separate the given ordinates 
may be equal or unequal. 

Ideal interpolation requires that a large number of the given 
ordinates shall be used to determine the value of the ordinate 
or ordinates to be interpolated in order that whatever sense 
of uniformity exists between the values of the given ordinates 
shall be maintained after the interpolation. The degree of 
reliability of the given ordinates will usually decide roughly 
how many of them should be used to give appropriate results. 

The purpose of this paper is to derive a formula for interpolating
areas, as well as ordinates, among areas in as direct and
systematic a manner as that in which ordinary interpolation is
performed. Here, however, the areas or ordinates to be
interpolated are parts of the given areas, and the given areas
must be contiguous and of equal breadth.

The use of such a formula can best be explained by a concrete
illustration, and the best illustration is to be found in
connection with mortality or population statistics, especially
the latter, because of the great difficulty encountered in
attempting to gather such statistics for individual ages. The
tendency of human beings to state their ages in figures ending
with a "5", or more often a "0", leads to such a concentration of
data at those ages as to render the results worthless in the form
collected. For this reason such statistics are now usually
collected in age groups of 5, 10, or some other number of years,
and whenever values for the individual ages are desired they are
obtained by interpolation. The method of interpolation used in
the past has been very indirect, involving considerable
preliminary transformation of the data. Other problems will
readily suggest themselves once the purpose of the formula is
clear.

To illustrate, then, suppose the population of a community
to be given by the following hypothetical age groups:

Ages      Population
10-14        362
15-19        586
20-24        938

If we then draw a curve in such a way that ordinates dropped at
equal intervals enclose areas corresponding, in the same order,
to the populations given above, these areas may be thought of as
made up of smaller areas corresponding to the populations at the
individual ages. Thus the formulas to be derived may be used to
determine either the populations corresponding to individual ages
(ordinates) or the population for any group of ages (areas) that
is a fractional part of one of the original groups.

The derivation and application of the formulas are based upon the
calculus of finite differences; however, the theory involved in
the application is so simple that anyone can apply the formulas
by following the examples given in this paper.

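If we define the two functions $u_{x/t}$ and $y_{x/t}$ by the relation

$$\Delta y_{x/t} = u_{x/t} = y_{(x+1)/t} - y_{x/t},$$

whence

$$y_{x/t} = \sum_{i=0}^{x-1} u_{i/t} + y_0 = u_0 + u_{1/t} + \cdots + u_{(x-1)/t} + y_0, \qquad (1)$$

and if we expand $y_{(x+1)/t}$ and $y_{x/t}$ by the so-called Newton's formula
in terms of $y_0\,(= y_{0/t})$ and its differences, where

$$y_{x/t} = y_0 + \frac{x}{t}\,\Delta y_0 + \frac{x(x-t)}{t^2}\,\frac{\Delta^2 y_0}{2!} + \frac{x(x-t)(x-2t)}{t^3}\,\frac{\Delta^3 y_0}{3!} + \cdots$$

and

$$\begin{aligned}
y_{(x+1)/t} = {} & y_0 + \frac{x+1}{t}\,\Delta y_0 + \frac{(x+1)(x+1-t)}{t^2}\,\frac{\Delta^2 y_0}{2!}\\
& + \frac{(x+1)(x+1-t)(x+1-2t)}{t^3}\,\frac{\Delta^3 y_0}{3!} + \cdots,
\end{aligned}$$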



American Statistical Association. 






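we may write

$$u_{x/t} = \frac{\Delta y_0}{t} + \Delta F^{(2)}(x,t)\,\frac{\Delta^2 y_0}{t^2\,2!} + \Delta F^{(3)}(x,t)\,\frac{\Delta^3 y_0}{t^3\,3!} + \cdots, \qquad (2)$$

where $F(x,t)$ refers to the factorial so often found in connection
with finite differences,

$$F^{(2)}(x,t) = x(x-t), \qquad F^{(3)}(x,t) = x(x-t)(x-2t), \qquad \text{etc.},$$

and $\Delta F$ denotes the change in $F$ when $x$ is increased by unity,
$\Delta F^{(k)}(x,t) = F^{(k)}(x+1,t) - F^{(k)}(x,t)$. If we let

$$A = x, \qquad B = x - t, \qquad C = x - 2t, \qquad \text{etc.},$$

then

$$\Delta F^{(2)}(x,t) = A + B + 1, \qquad \Delta F^{(3)}(x,t) = AB + AC + BC + A + B + C + 1, \qquad \text{etc.}$$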

Formula (2) is the fundamental formula desired, but the various
terms on its right-hand side may be shown to have significant
interpretations.



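If we define

$$w_{x/t} = u_{x/t} + u_{(x+1)/t} + \cdots + u_{(x+t-1)/t},$$

with $x$ taking the values $0, t, 2t, \ldots$, so that $w_1 = w_{t/t}$,
$w_2 = w_{2t/t}$, and so on, then, by (1),

$$\begin{aligned}
y_0 &= y_0,\\
u_0 + u_{1/t} + \cdots + u_{(t-1)/t} + y_0 &= y_{t/t} = y_1 = w_0 + y_0,\\
u_0 + u_{1/t} + \cdots + u_{(2t-1)/t} + y_0 &= y_{2t/t} = y_2 = w_0 + w_1 + y_0,\\
y_3 &= w_0 + w_1 + w_2 + y_0,
\end{aligned}$$

etc. Hence

$$\Delta y_0 = y_1 - y_0 = w_0, \qquad \Delta y_1 = y_2 - y_1 = w_1,$$

and

$$\Delta^2 y_0 = \Delta y_1 - \Delta y_0 = \Delta w_0.$$

Likewise

$$\Delta^3 y_0 = \Delta^2 w_0, \qquad \Delta^4 y_0 = \Delta^3 w_0, \qquad \text{etc.},$$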


and (2) may be written

$$u_{x/t} = \frac{w_0}{t} + \Delta F^{(2)}(x,t)\,\frac{\Delta w_0}{t^2\,2!} + \Delta F^{(3)}(x,t)\,\frac{\Delta^2 w_0}{t^3\,3!} + \cdots, \qquad (3)$$

where it is easy to show that

$$\begin{aligned}
\Delta F^{(2)}(x,t) &= 2x - t + 1,\\
\Delta F^{(3)}(x,t) &= 3x^2 + 3x(1-2t) + (1 - 3t + 2t^2),\\
\Delta F^{(4)}(x,t) &= 4x^3 + 6x^2(1-3t) + 2x(2 - 9t + 11t^2) + (1 - 6t + 11t^2 - 6t^3),
\end{aligned}$$

etc.
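The three closed forms can be checked mechanically. The short Python sketch below is our own illustration, not part of the original paper, and the helper names `factorial_F` and `delta_F` are hypothetical. It forms $F^{(k)}(x,t)$ directly as the product $x(x-t)\cdots(x-(k-1)t)$, takes its difference in $x$ with unit step, and compares the result with each closed form at every point of an integer grid larger than the degrees involved, which is enough for the polynomials to be identical.

```python
def factorial_F(k, x, t):
    """F^(k)(x, t) = x(x - t)(x - 2t)...(x - (k - 1)t), with k factors."""
    product = 1
    for j in range(k):
        product *= x - j * t
    return product


def delta_F(k, x, t):
    """Difference of F^(k) taken in x with unit step: F^(k)(x + 1, t) - F^(k)(x, t)."""
    return factorial_F(k, x + 1, t) - factorial_F(k, x, t)


# Closed forms quoted in the text for k = 2, 3, 4.
closed_forms = {
    2: lambda x, t: 2 * x - t + 1,
    3: lambda x, t: 3 * x**2 + 3 * x * (1 - 2 * t) + (1 - 3 * t + 2 * t**2),
    4: lambda x, t: (4 * x**3 + 6 * x**2 * (1 - 3 * t)
                     + 2 * x * (2 - 9 * t + 11 * t**2)
                     + (1 - 6 * t + 11 * t**2 - 6 * t**3)),
}

# Exact integer comparison on a grid wider than the degrees involved.
for k, closed in closed_forms.items():
    for x in range(-10, 11):
        for t in range(1, 8):
            assert delta_F(k, x, t) == closed(x, t), (k, x, t)

print(delta_F(2, 5, 5), delta_F(3, 5, 5))  # prints: 6 -24
```

The two printed values are the ones needed for the age-15 example that follows.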

Now the values $w_0, w_1, w_2, \ldots$ are obviously groups of $t$
values, just as we find in mortality and population statistics,
and $u_{x/t}$ represents one of the values included in a group, its
identification depending upon the value of $x$. As an illustration,
take the example given at the beginning of this paper. Here
$t = 5$; $u_0, u_{1/5}$, etc., represent the populations at ages 10,
11, etc.; and $u_0 + u_{1/5} + \cdots + u_{4/5} = w_0 = 362$, etc. Thus

w₀ = 362
              Δw₀ = 224
w₁ = 586                  Δ²w₀ = 128
              Δw₁ = 352
w₂ = 938

and the population at age, say, 15 is $u_{5/5}$. Since
$\Delta F^{(2)}(5,5) = 6$ and $\Delta F^{(3)}(5,5) = -24$, formula (3), taken
to second differences, gives

$$u_{5/5} = \frac{w_0}{5} + \frac{3\,\Delta w_0}{25} - \frac{4\,\Delta^2 w_0}{125} = \frac{362}{5} + \frac{3(224)}{25} - \frac{4(128)}{125} = 95.184, \text{ or } 95.$$
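Formula (3) translates directly into a few lines of code. The sketch below is our own illustration, not the author's procedure; the function name `ordinate` and its argument layout (a list `[w_0, Δw_0, Δ²w_0, ...]`) are assumptions. Exact fractions are used so that the result can be compared with the printed figure without rounding error.

```python
from fractions import Fraction
from math import factorial


def factorial_F(k, x, t):
    """F^(k)(x, t) = x(x - t)(x - 2t)...(x - (k - 1)t), with k factors."""
    product = 1
    for j in range(k):
        product *= x - j * t
    return product


def delta_F(k, x, t):
    """F^(k)(x + 1, t) - F^(k)(x, t): the difference in x with unit step."""
    return factorial_F(k, x + 1, t) - factorial_F(k, x, t)


def ordinate(x, t, diffs):
    """Formula (3): u_{x/t} from diffs = [w_0, Δw_0, Δ²w_0, ...].

    The term in Δ^j w_0 has coefficient ΔF^(j+1)(x, t) / (t^(j+1) (j+1)!);
    for j = 0 this is 1/t because ΔF^(1)(x, t) = 1. Passing a shorter list
    truncates the formula at the corresponding difference.
    """
    return sum(
        Fraction(delta_F(j + 1, x, t), t ** (j + 1) * factorial(j + 1)) * d
        for j, d in enumerate(diffs)
    )


# Age 15 is u_{5/5}: second differences, w_0 = 362, Δw_0 = 224, Δ²w_0 = 128.
age_15 = ordinate(5, 5, [362, 224, 128])
print(float(age_15))  # prints: 95.184
```

Because every $\Delta F^{(k)}$ with $k \ge 2$ sums to zero over $x = 0, 1, \ldots, t-1$, `sum(ordinate(x, 5, [362, 224, 128]) for x in range(5))` returns exactly 362 however many differences are kept, so the interpolated ordinates always reproduce the group total.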

Since population and death statistics are usually given in
quinquennial age groups, $t$ usually has the value 5. In this case
(3) may be written

$$\begin{aligned}
u_{x/5} = {} & .2w_0 + .04(x-2)\,\Delta w_0 + .004(x^2 - 9x + 12)\,\Delta^2 w_0\\
& + .0008(x^3 - 21x^2 + 116x - 126)\,\frac{\Delta^3 w_0}{3}\\
& + .00004(x^4 - 38x^3 + 467x^2 - 2014x + 1915.2)\,\frac{\Delta^4 w_0}{3} + \cdots
\end{aligned} \qquad (4)$$

although we believe it is rarely necessary to use differences
beyond the second in such a connection. Of course, when the
formula is needed only to a lower order of differences, the terms
in the higher differences are simply ignored.
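As a check on the arithmetic of (4), the sketch below (our own; `coefficients_formula_3` and `coefficients_formula_4` are hypothetical names) computes the coefficients of $w_0, \Delta w_0, \ldots, \Delta^4 w_0$ twice, once from the general formula (3) with $t = 5$ and once from the polynomials printed in (4), using exact fractions, and asserts that the two agree for a range of integer $x$.

```python
from fractions import Fraction
from math import factorial


def factorial_F(k, x, t):
    """F^(k)(x, t) = x(x - t)...(x - (k - 1)t)."""
    product = 1
    for j in range(k):
        product *= x - j * t
    return product


def delta_F(k, x, t):
    """Difference of F^(k) in x with unit step."""
    return factorial_F(k, x + 1, t) - factorial_F(k, x, t)


def coefficients_formula_3(x, t=5, terms=5):
    """Coefficients of w_0, Δw_0, ... taken from the general formula (3)."""
    return [Fraction(delta_F(j + 1, x, t), t ** (j + 1) * factorial(j + 1))
            for j in range(terms)]


def coefficients_formula_4(x):
    """Coefficients of w_0, Δw_0, ..., Δ⁴w_0 as printed in formula (4)."""
    return [
        Fraction("0.2"),
        Fraction("0.04") * (x - 2),
        Fraction("0.004") * (x**2 - 9 * x + 12),
        Fraction("0.0008") * (x**3 - 21 * x**2 + 116 * x - 126) / 3,
        Fraction("0.00004") * (x**4 - 38 * x**3 + 467 * x**2 - 2014 * x
                               + Fraction("1915.2")) / 3,
    ]


for x in range(-20, 21):
    assert coefficients_formula_4(x) == coefficients_formula_3(x), x
print("formula (4) agrees with formula (3) for t = 5")
```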

If $x$ is given the value 7 (the middle age of the second group),
formula (4), up to and including second differences, becomes
identical with a formula used by Mr. George King in his method of
constructing abridged mortality tables, as explained in the report
of the English Registrar General for 1914.
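With $x = 7$ the coefficients in (4) become $.04(7-2) = .2$ and $.004(49 - 63 + 12) = -.008$, so that, since $w_0 + \Delta w_0 = w_1$, the ordinate reduces to $.2w_1 - .008\,\Delta^2 w_0$. The sketch below (the function name is ours) checks this reduction with the example data; it does not attempt to reproduce King's own presentation.

```python
from fractions import Fraction


def u_formula_4(x, w0, dw0, d2w0):
    """Formula (4) for t = 5, kept to second differences (exact arithmetic)."""
    return (Fraction("0.2") * w0
            + Fraction("0.04") * (x - 2) * dw0
            + Fraction("0.004") * (x**2 - 9 * x + 12) * d2w0)


w0, dw0, d2w0 = 362, 224, 128   # groups 10-14, 15-19, 20-24 of the example
w1 = w0 + dw0                   # 586, the middle group

middle = u_formula_4(7, w0, dw0, d2w0)   # age 17
assert middle == Fraction("0.2") * w1 - Fraction("0.008") * d2w0
print(float(middle))  # prints: 116.176
```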

So far we have shown how formula (3) may be used to determine,
for example, the population for a single age. As a rule, however,
the populations for each and all of the individual ages in a group
are desired, and in such a case the work may be simplified by
using, not formula (3) or (4) itself, but the leading term and
differences obtained from (4) when the subscript $x/5$ is given
successively the values $5/5, 6/5, \ldots, 9/5$ and the five results
are differenced. The leading term and differences, in workable
form, are as follows:

Leading term                 .2w₀ + .12Δw₀ − .032Δ²w₀ + .0144Δ³w₀
1st difference                      .04Δw₀ + .008Δ²w₀ − .0064Δ³w₀
2d difference                                .008Δ²w₀ − .0016Δ³w₀
3d difference                                           .0016Δ³w₀    (A)
4th and higher differences   0
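Scheme (A) can be derived mechanically. The sketch below (our own; the helper names are hypothetical) builds, for each $x = 5, \ldots, 9$, the row of coefficients of $w_0, \Delta w_0, \Delta^2 w_0, \Delta^3 w_0$ given by formula (3) with $t = 5$, differences the five rows column by column, and keeps the first entry of each difference row. With exact fractions it reproduces (A), including vanishing fourth differences.

```python
from fractions import Fraction
from math import factorial


def factorial_F(k, x, t):
    """F^(k)(x, t) = x(x - t)...(x - (k - 1)t)."""
    product = 1
    for j in range(k):
        product *= x - j * t
    return product


def delta_F(k, x, t):
    """Difference of F^(k) in x with unit step."""
    return factorial_F(k, x + 1, t) - factorial_F(k, x, t)


def coefficient_row(x, t=5, order=3):
    """Coefficients of w_0, Δw_0, ..., Δ^order w_0 in u_{x/t}, from formula (3)."""
    return [Fraction(delta_F(j + 1, x, t), t ** (j + 1) * factorial(j + 1))
            for j in range(order + 1)]


def leading_term_and_differences(first_x, t=5, order=3):
    """Rows for x = first_x, ..., first_x + t - 1, then the first entry of each
    successive column-wise difference: [leading term, 1st diff, 2d diff, ...]."""
    rows = [coefficient_row(x, t, order) for x in range(first_x, first_x + t)]
    scheme = []
    while rows:
        scheme.append(rows[0])
        rows = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(rows, rows[1:])]
    return scheme


labels = ["Leading term", "1st difference", "2d difference", "3d difference", "4th difference"]
for label, row in zip(labels, leading_term_and_differences(first_x=5)):
    print(f"{label:16}", [float(c) for c in row])
```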

As an illustration, let us take our original, but now somewhat
extended, problem and break up the population for the age group
15-19 into the populations for each of the individual ages
included in the group.

Ages      Populations
10-14     w₀ = 362
                      Δw₀ = 224
15-19          586                 Δ²w₀ = 128
                      352                        Δ³w₀ = 105
20-24          938                 233
                      585
25-29         1523

When the leading term and differences are computed from (A) (they
are 96.696, 9.312, .856 and .168, the values at the head of their
respective columns below), written as shown, and added
successively in the ordinary way, we have the following results
(shown with the other values occurring in the course of the
addition):
Ages    Populations                                Rounded
15       96.696                                      97
                    9.312
16      106.008               .856                  106
                   10.168               .168
17      116.176              1.024                  116
                   11.192               .168
18      127.368              1.192                  127
                   12.384
19      139.752                                     140
        -------                                     ---
        586.000                                     586
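The successive addition just shown is easily mechanized. The sketch below (our own; `apply_scheme` and `build_up` are hypothetical names) multiplies the rows of (A) by $w_0, \Delta w_0, \Delta^2 w_0, \Delta^3 w_0$ to obtain the leading term and differences, then generates the five populations by adding each difference column into the one above it, as in the table.

```python
from fractions import Fraction

# Scheme (A): each row holds the coefficients of w_0, Δw_0, Δ²w_0, Δ³w_0.
SCHEME_A = [
    ["0.2", "0.12", "-0.032", "0.0144"],   # leading term
    ["0", "0.04", "0.008", "-0.0064"],     # 1st difference
    ["0", "0", "0.008", "-0.0016"],        # 2d difference
    ["0", "0", "0", "0.0016"],             # 3d difference
]


def apply_scheme(scheme, diffs):
    """Dot each row of the scheme with diffs = [w_0, Δw_0, Δ²w_0, ...]."""
    return [sum(Fraction(c) * d for c, d in zip(row, diffs)) for row in scheme]


def build_up(start, count=5):
    """Generate `count` values from [leading term, 1st diff, 2d diff, ...]
    by successive addition, as in an ordinary difference table."""
    column = list(start)
    values = []
    for _ in range(count):
        values.append(column[0])
        # Top-down update: each entry adds the old value of the entry below it.
        for i in range(len(column) - 1):
            column[i] += column[i + 1]
    return values


start = apply_scheme(SCHEME_A, [362, 224, 128, 105])   # 96.696, 9.312, .856, .168
ages_15_to_19 = build_up(start)
print([float(v) for v in ages_15_to_19], float(sum(ages_15_to_19)))
```

Rounding with `[round(v) for v in ages_15_to_19]` gives the whole-number column 97, 106, 116, 127, 140.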

In this illustration the work was carried out to an unnecessary
number of decimal places in order to show a perfect check on the
work: the five values add to exactly 586. Usually one or two
decimals will prove sufficient, depending upon the size of the
values involved.

As a general rule, mortality and population statistics, even at
their best, are faulty; hence in most cases third differences may
be ignored, in which case the leading term and differences may be
taken as

Leading term                 .2w₀ + .12Δw₀ − .032Δ²w₀
1st difference                      .04Δw₀ + .008Δ²w₀
2d difference                                .008Δ²w₀      (A′)
3d and higher differences    0

It will be noticed that the leading terms and differences given
above are applied centrally, i.e., at least one value must be
known on each side of the value to be broken up; hence they can
be used everywhere except to break up "end" values.

The leading term and differences for breaking up "end" values are
obtained by giving the subscript $x/5$ successively the values
$0/5, 1/5, \ldots, 4/5$ in formula (4) and differencing the five
results; they are as follows:

Leading term                 .2w₀ − .08Δw₀ + .048Δ²w₀ − .0336Δ³w₀
1st difference                      .04Δw₀ − .032Δ²w₀ + .0256Δ³w₀
2d difference                                .008Δ²w₀ − .0096Δ³w₀
3d difference                                           .0016Δ³w₀    (B)
4th and higher differences   0
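Scheme (B) can be checked directly from the polynomials of (4). In the sketch below (our own, with hypothetical names), `formula_4_row(x)` gives the coefficients of $w_0, \ldots, \Delta^3 w_0$ from (4), and the rows for $x = 0, \ldots, 4$ are differenced column by column; starting at $x = 5$ instead recovers (A).

```python
from fractions import Fraction


def formula_4_row(x):
    """Coefficients of w_0, Δw_0, Δ²w_0, Δ³w_0 in formula (4), t = 5."""
    return [
        Fraction("0.2"),
        Fraction("0.04") * (x - 2),
        Fraction("0.004") * (x**2 - 9 * x + 12),
        Fraction("0.0008") * (x**3 - 21 * x**2 + 116 * x - 126) / 3,
    ]


def difference_scheme(rows):
    """First row, then the first entry of each successive column-wise difference."""
    scheme = []
    while rows:
        scheme.append(rows[0])
        rows = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(rows, rows[1:])]
    return scheme


end_scheme = difference_scheme([formula_4_row(x) for x in range(0, 5)])       # (B)
central_scheme = difference_scheme([formula_4_row(x) for x in range(5, 10)])  # (A)
for row in end_scheme:
    print([float(c) for c in row])
```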
Thus, the leading term and differences (B) may be used to
break up "end" values such as population "362" for the age 
group (10-14) given above, as follows. However, in this case, 
only second differences will be used in order to show clearly 
how the work is finally performed. 
Ages    Populations
10       60.624
                    4.864
11       65.488              1.024
                    5.888
12       71.376              1.024
                    6.912
13       78.288              1.024
                    7.936
14       86.224
        -------
        362.000
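The successive-addition procedure of the text, applied with (B) cut off after second differences, reproduces this column. The sketch is our own illustration and its helper names are hypothetical.

```python
from fractions import Fraction

# Scheme (B) cut off after second differences: coefficients of w_0, Δw_0, Δ²w_0.
SCHEME_B2 = [
    ["0.2", "-0.08", "0.048"],   # leading term
    ["0", "0.04", "-0.032"],     # 1st difference
    ["0", "0", "0.008"],         # 2d difference
]


def apply_scheme(scheme, diffs):
    """Dot each row of the scheme with [w_0, Δw_0, Δ²w_0, ...]."""
    return [sum(Fraction(c) * d for c, d in zip(row, diffs)) for row in scheme]


def build_up(start, count=5):
    """Successive addition of [leading term, 1st diff, 2d diff, ...]."""
    column, values = list(start), []
    for _ in range(count):
        values.append(column[0])
        for i in range(len(column) - 1):
            column[i] += column[i + 1]   # uses the old value below
    return values


start = apply_scheme(SCHEME_B2, [362, 224, 128])   # 60.624, 4.864, 1.024
ages_10_to_14 = build_up(start)
print([float(v) for v in ages_10_to_14], sum(ages_10_to_14))
```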

Here, again, an unnecessary number of decimal places is carried in
order to show a perfect check on the work.

Of course, in breaking up the other of the two "end" values
occurring in any set of statistics, the column of values need only
be reversed and the formulas applied in the same way. In all cases
any negative signs that appear must be carefully preserved and
carried along with the work.
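To illustrate the reversal, the sketch below (our own; the numbers it prints are our computation, not figures from the paper) breaks up the other end value of the extended example, 1523 for ages 25-29, using (B) to second differences. Read upward, the column is 1523, 938, 586, so $w_0 = 1523$, $\Delta w_0 = -585$ and $\Delta^2 w_0 = 233$; the negative first difference is carried through, and the values come out for ages 29, 28, 27, 26, 25 in that order.

```python
from fractions import Fraction

# Scheme (B) to second differences: coefficients of w_0, Δw_0, Δ²w_0.
SCHEME_B2 = [
    ["0.2", "-0.08", "0.048"],   # leading term
    ["0", "0.04", "-0.032"],     # 1st difference
    ["0", "0", "0.008"],         # 2d difference
]


def leading_differences(values, order):
    """Return [v_0, Δv_0, ..., Δ^order v_0] for a column of group totals."""
    result, row = [], list(values)
    for _ in range(order + 1):
        result.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return result


def apply_scheme(scheme, diffs):
    return [sum(Fraction(c) * d for c, d in zip(row, diffs)) for row in scheme]


def build_up(start, count=5):
    column, values = list(start), []
    for _ in range(count):
        values.append(column[0])
        for i in range(len(column) - 1):
            column[i] += column[i + 1]
    return values


# Groups 25-29, 20-24, 15-19 taken in reverse order, so the "end" group comes first.
reversed_totals = [1523, 938, 586]
diffs = leading_differences(reversed_totals, order=2)     # [1523, -585, 233]
ages_29_down_to_25 = build_up(apply_scheme(SCHEME_B2, diffs))
ages_25_to_29 = list(reversed(ages_29_down_to_25))
print([float(v) for v in ages_25_to_29], sum(ages_25_to_29))
```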

The formulas derived above will prove sufficient to cover
practically all cases met with in ordinary work of this kind; but,
looked at from a geometrical standpoint, they can be used to
interpolate only what might be regarded as ordinates among areas.
To make our paper at all complete we should include a formula that
would allow us, in case of necessity, to determine, say, the
population corresponding to some fraction of a particular age
group, such as the first 3/5 or the middle 2/10, and, in fact, any
fractional part of any group. Such a formula would then be used to
interpolate what might be regarded as areas.

To establish a formula for determining the value which would
correspond, say, to any $n/t$ part of a given group, it is obviously
necessary merely to determine the value of the sum

$$U_{n,\,x/t} = u_{x/t} + u_{(x+1)/t} + \cdots + u_{(x+n-1)/t}$$

in terms of formula (3), where, to recapitulate, $U_{n,\,x/t}$ gives
the value corresponding to the $n/t$ part, beginning at $x/t$, of the
group with which $x/t$ is connected; the subscripts
$0/t, 1/t, \ldots, (t-1)/t$ fall in the first group,
$t/t, (t+1)/t, \ldots, (2t-1)/t$ in the second group, etc.

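The summation suggested above is so easily performed by finite
integration (the sum of $\Delta F^{(k)}(x+i,t)$ over $i = 0, 1, \ldots, n-1$
telescopes to $F^{(k)}(x+n,t) - F^{(k)}(x,t)$) that merely the final
result, to second differences, is given:

$$U_{n,\,x/t} = \frac{n\,w_0}{t} + \frac{A\,\Delta w_0}{t^2\,2!} + \frac{B\,\Delta^2 w_0}{t^3\,3!} + \cdots, \qquad (5)$$

where, with $A$ and $B$ now used in a new sense,

$$A = n(2x + n - t), \qquad B = n\{3x^2 + 3x(n - 2t) + (n - t)(n - 2t)\}, \qquad \text{etc.}$$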
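The finite integration behind (5) can be confirmed numerically. The sketch below (our own; helper names hypothetical) sums $\Delta F^{(2)}$ and $\Delta F^{(3)}$ over $n$ consecutive values of $x$ and compares the totals with the expressions $A$ and $B$ of (5) for many small integer values of $n$, $x$ and $t$.

```python
def factorial_F(k, x, t):
    """F^(k)(x, t) = x(x - t)...(x - (k - 1)t)."""
    product = 1
    for j in range(k):
        product *= x - j * t
    return product


def delta_F(k, x, t):
    """Difference of F^(k) in x with unit step."""
    return factorial_F(k, x + 1, t) - factorial_F(k, x, t)


def coef_A(n, x, t):
    """A of formula (5)."""
    return n * (2 * x + n - t)


def coef_B(n, x, t):
    """B of formula (5)."""
    return n * (3 * x**2 + 3 * x * (n - 2 * t) + (n - t) * (n - 2 * t))


for t in range(1, 8):
    for x in range(0, 3 * t):
        for n in range(1, t + 1):
            assert coef_A(n, x, t) == sum(delta_F(2, x + i, t) for i in range(n))
            assert coef_B(n, x, t) == sum(delta_F(3, x + i, t) for i in range(n))
print("A and B agree with the summed differences")
```

With $n = 1$ the two expressions reduce to $\Delta F^{(2)}(x,t)$ and $\Delta F^{(3)}(x,t)$, so (5) contains (3) as a special case.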

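Suppose, as an illustration, we desire to determine the population
of the first 3/5 part of the quinquennial group 15-19 of the
example given in this paper, using second differences. Here
$t = x = 5$ and $n = 3$, so that $A = 24$ and $B = -48$, and

$$U_{3,\,5/5} = \frac{3}{5}\,w_0 + \frac{12\,\Delta w_0}{25} - \frac{8\,\Delta^2 w_0}{125} = \frac{3(362)}{5} + \frac{12(224)}{25} - \frac{8(128)}{125} = 316.528, \text{ or } 317,$$

which will be found to be the same as the sum of 95.184, 105.168
and 116.176, the populations at ages 15, 16 and 17 found separately
using second differences.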
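The agreement claimed above is easy to confirm. In the sketch below (our own; names hypothetical), `partial_group` evaluates (5) and `ordinate` evaluates (3), both to second differences, the latter using the closed forms for $\Delta F^{(2)}$ and $\Delta F^{(3)}$ given earlier.

```python
from fractions import Fraction


def partial_group(n, x, t, w0, dw0, d2w0):
    """Formula (5) to second differences: U_{n, x/t}."""
    A = n * (2 * x + n - t)
    B = n * (3 * x**2 + 3 * x * (n - 2 * t) + (n - t) * (n - 2 * t))
    return Fraction(n * w0, t) + Fraction(A * dw0, t**2 * 2) + Fraction(B * d2w0, t**3 * 6)


def ordinate(x, t, w0, dw0, d2w0):
    """Formula (3) to second differences, using the closed forms of ΔF^(2) and ΔF^(3)."""
    dF2 = 2 * x - t + 1
    dF3 = 3 * x**2 + 3 * x * (1 - 2 * t) + (1 - 3 * t + 2 * t**2)
    return Fraction(w0, t) + Fraction(dF2 * dw0, t**2 * 2) + Fraction(dF3 * d2w0, t**3 * 6)


first_three_fifths = partial_group(3, 5, 5, 362, 224, 128)
separately = [ordinate(x, 5, 362, 224, 128) for x in (5, 6, 7)]   # ages 15, 16, 17
print(float(first_three_fifths), [float(v) for v in separately])
```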



